import numpy as np
import plotly
import plotly.graph_objects as go
import pandas as pd
from datetime import datetime
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sb
plotly.offline.init_notebook_mode()
For my first graph I want to demonstrate how to use plotly to plot the historical prices of a stock using a candlestick chart. I chose to do this because one of my goals for this program is to learn how to apply machine learning methods to financial markets.
For this demonstration, I will be using the historical price data of Unity, the company behind the Unity game development software. I chose Unity simply because I am currently learning how to use it.
unity = pd.read_csv("U.csv")
fig = go.Figure(data = [go.Candlestick(x = unity['Date'],
open = unity['Open'],
high = unity['High'],
low = unity['Low'],
close = unity['Close'])])
fig.update_layout(
title = "Unity Stock Prices",
yaxis_title = "Price",
xaxis_title = "Date",
xaxis_rangeslider_visible = False
)
fig.show()
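A candlestick can be read as follows: the thick body spans the opening and closing prices, and the thin wicks above and below it extend to the high and low for the period. With plotly's default styling, a green candle closed above its open and a red candle closed below it.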
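As a rough illustration, the sketch below computes the parts of a single candle from its four prices. The function name `describe_candle` and the sample prices are hypothetical and are not taken from the Unity data.

```python
def describe_candle(open_price, high, low, close):
    """Summarize one candlestick from its open, high, low, and close prices."""
    body = abs(close - open_price)               # height of the thick body
    upper_wick = high - max(open_price, close)   # thin line above the body
    lower_wick = min(open_price, close) - low    # thin line below the body
    if close > open_price:
        direction = "increasing"
    elif close < open_price:
        direction = "decreasing"
    else:
        direction = "flat"
    return {"direction": direction, "body": body,
            "upper_wick": upper_wick, "lower_wick": lower_wick}

# Hypothetical prices for a single trading day
candle = describe_candle(open_price=30.0, high=33.0, low=29.0, close=32.0)
print(candle)
```

Here the close is above the open, so the candle is increasing, with a body of 2 and a wick of 1 on each side.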
The Relative Strength Index (RSI) is an indicator that measures how overbought or oversold a stock is. The formula for RSI is $RSI = 100 - \left[\frac{100}{1 + \frac{n_{up}}{n_{down}}}\right]$, where $n_{up}$ and $n_{down}$ are the average gains and average losses respectively over the period. A general rule of thumb is that when the RSI is above 70, the stock is considered overbought, which is read as a signal that the price may start to fall. Conversely, when the RSI is below 30, the stock is considered oversold, which is read as a signal that the price may start to rise.
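To see the formula at work, here is a small worked example on five made-up daily returns. The numbers are hypothetical and chosen only so the arithmetic is easy to follow; `avg_gain` and `avg_loss` play the roles of $n_{up}$ and $n_{down}$.

```python
# Hypothetical daily returns over a 5-day period (not real Unity data)
returns = [0.02, -0.01, 0.03, -0.02, 0.01]
period = len(returns)

avg_gain = sum(r for r in returns if r >= 0) / period        # n_up
avg_loss = abs(sum(r for r in returns if r < 0)) / period    # n_down

rsi_value = 100 - 100 / (1 + avg_gain / avg_loss)
print(round(rsi_value, 2))
```

The gains sum to 0.06 and the losses to 0.03, so the ratio is 2 and the RSI is about 66.67, just under the overbought threshold of 70.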
First, we need to write a function that takes in the data of the stock and the reference period, and outputs the RSI values:
def rsi(stockData, period):
    # Daily return measured from open to close, relative to the close
    openPrices = stockData['Open']
    closePrices = stockData['Close']
    returns = (closePrices - openPrices) / closePrices
    rsIndexes = []
    # Slide a window of `period` returns across the data;
    # the first window covers rows 0..period-1 and the last one ends on the final row
    for i in range(period, len(returns) + 1):
        window = returns.iloc[(i - period):i]
        avgGain = window[window >= 0].sum() / period
        avgLoss = abs(window[window < 0].sum()) / period
        if avgLoss == 0:
            # No losses in the window, so the RSI is at its maximum
            rsIndexes.append(100.0)
        else:
            rsIndexes.append(100 - (100 / (1 + avgGain / avgLoss)))
    return np.array(rsIndexes)
rsiValues = rsi(unity, 10)
Using matplotlib, we can now plot the relative strength index of Unity stock. We will also add horizontal lines at $RSI = 30$ and $RSI = 70$ to indicate when the stock might be oversold or overbought.
dates = unity['Date'][(len(unity['Date']) - len(rsiValues)):len(unity['Date'])] # Getting the dates that correspond with the values in RSI
fig, ax = plt.subplots()
ax.plot(dates, rsiValues)
plt.axhline(y = 70, color = 'g', linestyle = '--', linewidth = 2)
plt.axhline(y = 30, color = 'r', linestyle = '--', linewidth = 2)
plt.title("Relative Strength Index of Unity")
plt.xlabel('Date')
plt.ylabel('RSI')
The dates are difficult to read, but we can see that the stock was oversold towards the middle. The stock is also currently overbought, which suggests that the price might go down.
For my next chart, I want to compare the total gains and total losses between Microsoft and AMD. I chose these two stocks because Microsoft is considered to be a low-risk stock, while AMD is considered to be a high-risk stock. Using a bar chart, I want to visually show the differences between profits and losses.
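In order to plot this efficiently, we need to organize the data so that the barplot function can read everything easily. To do this, we will compute each stock's daily return, tag every row with its ticker symbol, stack the two tables into one, record whether each return is positive or negative, and then replace each return with its absolute value so that gains and losses can be summed separately.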
microsoft = pd.read_csv("MSFT.csv")
amd = pd.read_csv("AMD.csv")
microsoftReturns = (microsoft['Close'] - microsoft['Open']) / microsoft['Open']
amdReturns = (amd['Close'] - amd['Open']) / amd['Open']
microsoftDF = pd.DataFrame(data = {'Symbol': ['MSFT'] * len(microsoftReturns),
'Return': microsoftReturns})
amdDF = pd.DataFrame(data = {'Symbol': ['AMD'] * len(amdReturns),
'Return': amdReturns})
stockData = pd.concat([microsoftDF, amdDF], axis = 0)
stockData = stockData.reset_index(drop = True)
# Label each row by the sign of its return (vectorized instead of a row-by-row loop)
stockData['Direction'] = np.where(stockData['Return'] >= 0, "POSITIVE", "NEGATIVE")
stockData['Return'] = abs(stockData['Return'])
print(stockData)
    Symbol    Return Direction
0     MSFT  0.001965  POSITIVE
1     MSFT  0.008455  POSITIVE
2     MSFT  0.001570  POSITIVE
3     MSFT  0.021779  NEGATIVE
4     MSFT  0.013074  POSITIVE
..     ...       ...       ...
497    AMD  0.018636  NEGATIVE
498    AMD  0.010653  POSITIVE
499    AMD  0.024834  POSITIVE
500    AMD  0.016601  NEGATIVE
501    AMD  0.043179  NEGATIVE

[502 rows x 3 columns]
Now that we have all the columns we need, we move on to graphing the returns.
sb.barplot(data = stockData, x = "Symbol", y = "Return", hue = "Direction", estimator = sum, errorbar = None)
From the bar graph, we can see that total gains exceed total losses for both MSFT and AMD. AMD has significantly higher total gains, but it also has significantly higher total losses. This is why a stock like Microsoft is generally considered low-risk, while a stock like AMD is generally considered high-risk. Both stocks appear to have about the same net profit (AMD appears to be slightly higher), but Microsoft shows less volatility.
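The net figure can also be computed directly rather than read off the chart. The sketch below is a minimal illustration on a hypothetical four-row table in the same layout as `stockData`; the name `sample` and its values are made up, and `pivot_table` is only one of several ways to do this.

```python
import pandas as pd

# Hypothetical rows in the same layout as stockData (not the real MSFT/AMD returns)
sample = pd.DataFrame({
    "Symbol":    ["MSFT", "MSFT", "AMD", "AMD"],
    "Return":    [0.010, 0.004, 0.030, 0.025],
    "Direction": ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"],
})

# Total gains and total losses per symbol, one column per direction
totals = sample.pivot_table(index="Symbol", columns="Direction",
                            values="Return", aggfunc="sum", fill_value=0)

# Net return = total gains minus total losses
totals["Net"] = totals["POSITIVE"] - totals["NEGATIVE"]
print(totals)
```

Applied to the real `stockData`, the same two steps would give the bar heights from the chart plus a net column for each symbol.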
This last plot will not be a finance plot. However, I did want to practice plotting various functions and lines-of-best-fit, especially since we will be studying linear regression shortly.
We will start by generating some data.
def func(x):
    y = 3*x + 4
    return y
xvals = np.arange(0, 101, 1)
mean = 0
std = 50
yvals = func(xvals) + np.random.normal(mean, std, 101)
fig, lineplot = plt.subplots()
lineplot.plot(xvals, yvals, '.')
lineplot.plot(xvals, func(xvals))
Above, we generated data centred around a line with a slope of 3 and a y-intercept of 4. We then overlaid the actual line with slope 3 and y-intercept 4. When we move on to linear regression, I would like to use the above structure for graphing. However, instead of using a predetermined slope and y-intercept, these values would be calculated using regression.
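As a preview of that step, here is a minimal sketch that estimates the slope and intercept with `np.polyfit`, which performs a least-squares polynomial fit. This is only one possible approach and not necessarily the method we will cover; the seeded generator `rng` is an assumption added so the result is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so the sketch is reproducible (assumption)
xvals = np.arange(0, 101, 1)
yvals = 3 * xvals + 4 + rng.normal(0, 50, xvals.size)

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept]
slope, intercept = np.polyfit(xvals, yvals, 1)

# Fitted line, which could replace func(xvals) in the plotting call
fitted = slope * xvals + intercept
print(slope, intercept)
```

Because the noise is large relative to the line, the estimated intercept in particular can land some distance from 4, while the slope should come out close to 3.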